Hypothesis Testing

The Null

The Null $H_0$ is usually “The World is as it is and my intervention had no effect.” It’s the most conservative position you can imagine: your exertions had No Effect. Here are some typical Nulls:

Test type	Typical $H_0$	Typical $H_A$
Mean difference	$\mu = 0$	$\mu \ne 0$
Risk ratio / odds ratio	RR = 1	RR ≠ 1
Hazard ratio (Cox)	HR = 1	HR ≠ 1
Correlation	$\rho = 0$	$\rho \ne 0$
Regression coefficient	$\beta = 0$	$\beta \ne 0$

The Alternative Hypothesis $H_A$ is usually “Yeah my shit might have done something to The World” as measured by some estimate. You’re trying to see if you can falsify the Null (thanks Popper et al.)

Hark!

You never accept the Alternative/Research Hypothesis $H_A$! Falsifiability FTW! You either reject or fail to reject the Null Hypothesis $H_0$.

A Bounded Null

Do you ever set $H_0: \mu \ne \mu_0$ … ?

Nope. Think about what you’d say if you rejected the null $H_0: \mu \ne 5$: “The mean is exactly 5.” That’s weird statistically and kinda philosophically.

And you’re testing the Null by calculating a sampling distribution under $H_0$ and asking “How surprising is my data?” You answer this question by computing all manner of Test Statistics (e.g. $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$) and then computing the p-Value.

So which value from $\mathbb{R}$ are you going to plug in for $\mu_0$ that’s not $5$? Yep. Each one will give you a different distribution. Just don’t do it.
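Here’s a minimal sketch of that problem, using only the Python standard library. The sample mean, $\sigma$, $n$ and the candidate values are made-up numbers for illustration, and `z_statistic` is a hypothetical helper name. A point null gives you exactly one $z$; a “not equal” null gives you a different $z$ for every value you try.

```python
import math


def z_statistic(sample_mean, mu_0, sigma, n):
    """z = (sample_mean - mu_0) / (sigma / sqrt(n)) for one hypothesised mean mu_0."""
    return (sample_mean - mu_0) / (sigma / math.sqrt(n))


# Hypothetical numbers, chosen only for illustration.
sample_mean, sigma, n = 5.4, 2.0, 100

# A point null (H0: mu = 5) pins down a single test statistic.
z_point = z_statistic(sample_mean, mu_0=5.0, sigma=sigma, n=n)

# "H0: mu != 5" leaves mu_0 open, so every candidate gives a different statistic.
candidates = [4.0, 4.9, 5.1, 6.0]
z_by_candidate = {
    mu_0: z_statistic(sample_mean, mu_0, sigma, n) for mu_0 in candidates
}

print(f"H0: mu = 5   ->  z = {z_point:.2f}")
for mu_0, z in z_by_candidate.items():
    print(f"mu_0 = {mu_0}  ->  z = {z:.2f}")
```

There is no single sampling distribution to compare your data against, which is the whole point of keeping the equality in $H_0$.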

p-Value

If you want to make a falsifiable claim (thanks Popper) about The World, a p-value is as easy as this:

What is the probability of seeing what I saw in my experiment if the null hypothesis is true?¹

78%? Well that sounds bad. You fail to reject the null. 5%? That’s small. Maybe something’s going on? 0.1%? Okay maybe something’s really going on. “Something” here means association, not causation.
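If you’d rather see that than take it on faith, here’s a rough simulation sketch (standard library only). It assumes a one-sample $z$-test with known $\sigma$; the null mean, $\sigma$, $n$ and observed mean are invented for illustration. It builds a world where $H_0$ is true, reruns the experiment lots of times, and counts how often the result is at least as extreme as what was observed.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical setup: null mean, known sd, sample size, observed sample mean.
mu_0, sigma, n = 5.0, 2.0, 100
observed_mean = 5.4


def z_statistic(sample_mean):
    return (sample_mean - mu_0) / (sigma / math.sqrt(n))


z_obs = z_statistic(observed_mean)

# Simulate the sampling distribution of z in a world where H0 is true.
n_sims = 5_000
null_z = [
    z_statistic(fmean(random.gauss(mu_0, sigma) for _ in range(n)))
    for _ in range(n_sims)
]

# Two-sided p-value: share of null-world results at least as extreme as ours.
p_simulated = sum(abs(z) >= abs(z_obs) for z in null_z) / n_sims

# The same quantity from the standard normal, for comparison.
p_analytic = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"simulated p = {p_simulated:.3f}, analytic p = {p_analytic:.3f}")
```

The two numbers should land close to each other; the simulated one wiggles a bit with the seed and the number of simulations.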

Confidence Intervals

You’ve seen them. “RR = 1.5 95% CI [1.3,1.6]”. What do they mean? Do they mean that you’re 95% sure the true value is somewhere in there?

Nope! Common mistake². You’re saying that if you repeated your experiment many times, your estimate and its interval would ‘wiggle’ each time (different sample, other rando effects), but about 95% of those intervals would contain the true value. That’s all.
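The repeated-experiment idea is easy to simulate. This is a sketch under stated assumptions, not a recipe: the “true” mean, $\sigma$ and $n$ are made up, $\sigma$ is treated as known (so the interval is a plain $z$-interval), and only the standard library is used. Each fake experiment builds its own 95% interval, and we count how many of those intervals contain the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical "truth" that a real experiment would not get to see.
true_mean, sigma, n = 10.0, 3.0, 50

z_crit = NormalDist().inv_cdf(0.975)         # about 1.96 for a 95% interval
half_width = z_crit * sigma / math.sqrt(n)   # known-sigma z-interval

n_experiments = 2_000
covered = 0
for _ in range(n_experiments):
    sample_mean = fmean(random.gauss(true_mean, sigma) for _ in range(n))
    lower, upper = sample_mean - half_width, sample_mean + half_width
    covered += lower <= true_mean <= upper   # True counts as 1

coverage = covered / n_experiments
print(f"{coverage:.1%} of the intervals contain the true mean")
```

Any single interval either contains the true mean or it doesn’t; the 95% describes the procedure over many repeats.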

Crossing the Null

“Which Test?” TLDR

To pick a test, you’ll generally be asking:

What is the nature of my Data³? Continuous? Categorical?
How many groups am I dealing with? One, two, or more than two?

Here’s a nice little table from this excellent video (by a Columbia alum!)

	1 Group	2 Groups	>2 Groups
Categorical Data	Proportion Test ($Z$-test approx.); $\chi^2$ Test	Proportion Test ($Z$-test approx.); $\chi^2$ Test	$\chi^2$ Test
Continuous Data	$Z$-test & Variants; $t$-test & Variants	$Z$-test & Variants; $t$-test & Variants	ANOVA ($F$-test, 1-way, 2-way)
Classic Assumptions Violated⁴	Sign Test; Signed Rank Test	Wilcoxon–Mann–Whitney Test; Paired $t$-test; McNemar’s Test	Kruskal–Wallis Test
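To make one cell of the table concrete, here’s a sketch of the “Categorical Data, 2 Groups” case: a two-sided two-proportion $Z$-test using the pooled proportion. The function name `two_proportion_z_test` and the counts are hypothetical, and the normal approximation assumes reasonably large groups.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: p_a = p_b, using the pooled proportion."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 45 of 100 in group A vs 30 of 100 in group B.
z, p_value = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With small counts the approximation gets shaky, which is when the bottom row of the table starts to matter.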

¹ “The likelihood of obtaining results at least as extreme as the ones observed, assuming that the null hypothesis is true”. I’ve never liked this as a starter definition.
² That’s a Bayesian credible interval.
³ Always seek to understand your Data all the time 🙏
⁴ Too many outliers, small sample size, correlated observations
